Abstract: LRRs (leucine rich repeats) are present in over 14,000 proteins. Non-LRR island
regions (IRs) interrupting LRRs are widely distributed. The present article reviews 19
families of LRR proteins having non-LRR IRs (LRR@IR proteins) from various plant
species. The LRR@IR proteins are LRR-containing receptor-like kinases (LRR-RLKs),
LRR-containing receptor-like proteins (LRR-RLPs), TONSOKU/BRUSHY1, and MJK13.7;
the LRR-RLKs are homologs of TMK1/Rhg4, BRI1, PSKR, PSYR1, Arabidopsis
At1g74360, and RPK2, while the LRR-RLPs are those of Cf-9/Cf-4, Cf-2/Cf-5, Ve, HcrVf,
RPP27, EIX1, clavata 2, fasciated ear2, RLP2, rice Os10g0479700, and a putative soybean
disease resistance protein. The LRRs are intersected by single, non-LRR IRs; only the
RPK2 homologs have two IRs. In most of the LRR-RLKs and LRR-RLPs, the number of
repeat units in the preceding LRR block (N1) is greater than the number in the following
block (N2); N1 » N2, in which N1 is variable in the homologs of individual families, while N2
is highly conserved. The five families of the LRR-RLKs except for the RPK2 family show
N1 = 8-18 and N2 = 3-5. The nine families of the LRR-RLPs show N1 = 12-33 and N2
= 4, while N1 = 6 and N2 = 4 for the rice Os10g0479700 family and N1 = 4-28 and N2
= 4 for the soybean protein family. The rule of N1 » N2 might play a common, significant
role in ligand interaction, dimerization, and/or signal transduction of the
LRR-RLKs and the LRR-RLPs. The structure and evolution of the LRR domains with
non-LRR IRs and their proteins are also discussed.

1. Introduction

LRR (leucine rich repeat) regions are present in over 14,000 proteins in the databases PFAM,
SMART, PROSITE, and InterPro [1-4]. LRR-containing proteins have been identified in viruses,
bacteria, archaea, and eukaryotes. Arabidopsis thaliana and Oryza sativa subsp. japonica (rice) contain
over 700 and 1,400 LRR proteins, respectively [5]. Most LRR proteins are involved in protein-ligand
and protein-protein interactions, including those of the plant immune response and the mammalian
innate immune response [6-10]. Most LRR repeating units are 20-30 residues in length. All LRR units
can be divided into an HCS (highly conserved segment) and a VS (variable segment). The HCS part
consists of an 11-residue stretch, LxxLxLxxNxL, or a 12-residue stretch, LxxLxLxxCxxL, in which
"L" is Leu, Ile, Val, or Phe, "N" is Asn, Thr, Ser, or Cys, and "C" is Cys, Ser, or Asn [7,11-14]. Eight
classes of LRRs have been characterized by different lengths and consensus sequences of the VS part of
the repeats. They are "RI-like", "CC", "Bacterial", "SDS22-like", "plant specific (PS)", "Typical",
"TpLRR", and "IRREKO". Plant specific LRRs (class: PS-LRR) are 23 to 25 residues long and
contain a conserved consensus sequence of the VS part, SGxIPxxLxxLxx, in which "S" is Ser or Thr,
"G" is Gly or Ser, "I" is Ile or Leu, "L" is Leu, Ile, Val, Phe, or Met, and "x" is any amino acid [14].
The structures of polygalacturonase inhibiting protein (PGIP) and brassinosteroid insensitive 1 (BRI1),
which have PS-LRRs, are available [15-17].
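
The HCS consensus given above can be written as a regular expression. The following is a minimal Python sketch under stated assumptions: it only scans for the 11- and 12-residue HCS patterns with the residue classes defined in the text, it is not the LRRpred method, and the names find_hcs, HCS_11, and HCS_12 are illustrative.

```python
import re

# Residue classes for the HCS positions, as defined in the text.
L_CLASS = "[LIVF]"   # "L" is Leu, Ile, Val, or Phe
N_CLASS = "[NTSC]"   # "N" is Asn, Thr, Ser, or Cys
C_CLASS = "[CSN]"    # "C" is Cys, Ser, or Asn
X = "."              # "x" is any amino acid

# 11-residue HCS: LxxLxLxxNxL
HCS_11 = re.compile(f"{L_CLASS}{X}{X}{L_CLASS}{X}{L_CLASS}{X}{X}{N_CLASS}{X}{L_CLASS}")
# 12-residue HCS: LxxLxLxxCxxL
HCS_12 = re.compile(f"{L_CLASS}{X}{X}{L_CLASS}{X}{L_CLASS}{X}{X}{C_CLASS}{X}{X}{L_CLASS}")


def find_hcs(sequence):
    """Return (start, segment) for every 11-residue HCS candidate.

    Every start position is tested, so overlapping candidates are reported.
    Start positions are 0-based.
    """
    hits = []
    for i in range(len(sequence) - 10):
        m = HCS_11.match(sequence, i)
        if m:
            hits.append((i, m.group()))
    return hits
```

For instance, the segment IKVLDLHSNKI, quoted in Section 3.3 of this review as part of an LRR in TLR1, satisfies the 11-residue pattern. A pattern match alone over-predicts, because it ignores the VS part and the repeat phasing.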

LRR-containing proteins from plants have diverse overall structures and functions. The major classes
include LRR-containing receptor-like kinases (LRR-RLKs) [18,19], LRR-containing receptor-like proteins
(LRR-RLPs) [20], nucleotide binding site LRR (NBS-LRR) proteins [21,22], and PGIPs [23-25].
They provide an early warning system for the presence of potential pathogens and activate protective
immune signaling in plants [26-28]. In addition, they act as signal amplifiers in the case of tissue
damage, establish symbiotic relationships, and affect developmental processes.

The evolution of plant disease resistance (R) genes that encode an LRR region has been studied by
many researchers [18,22,29-45]. The generation of R genes is proposed to be mainly due to gene
duplication, genetic recombination, diversifying selection, sequence divergence in the intergenic
region, composition of the transposable elements, gene conversion, and unequal crossover [41-43].

Non-LRR island regions (IRs) interrupting LRRs are widely distributed; they are referred to as
"islands" or "loop outs" [46,47]. A large number of plant LRR proteins have non-LRR IRs; such
proteins are called LRR@IR proteins and include LRR-RLKs and LRR-RLPs [46-61]. Some experimental
studies on the function of non-LRR IRs within LRR@IR proteins have been performed [62-64].
Among the Toll-like receptors (TLRs), TLRs 7, 8, and 9 are also LRR@IR proteins [65-67]; TLRs initiate
an innate immune response [68-71].

A method, LRRpred, was developed to identify the repeat number of LRRs and their phasing (that is,
which segment or residue corresponds to the beginning of a repeating unit); it incorporates protein
secondary structure prediction [65,72]. The repeat number and phasing of LRRs predicted by LRRpred
are completely, or almost completely, consistent with those revealed by structure analyses [72].
Furthermore, to identify non-LRR IRs, a method (called LRR@IRpred) utilizing LRRpred was
developed and used to find LRR@IR proteins from organisms other than plants [47]. The present
article reviews 19 families of plant LRR@IR proteins identified by LRR@IRpred and describes some
features of their LRR domains. The structure, function, and evolution of the LRR domains as well as
the LRR@IR proteins are discussed.

2. Structures of Plant LRR Proteins 

All of the LRR domains in one protein form a single continuous structure and adopt an arc or
horseshoe shape [73]. Three residues at positions 3 to 5 in the HCS, LxxLxLxxNxL or
LxxLxLxxCxxL, form a short β-strand. On the inner, concave face there is a stack of parallel
β-strands, and on the outer, convex face there are a variety of secondary structures such as α-helix,
3₁₀-helix, polyproline II helix, or a tandem arrangement of β-turns, which are connected by two loops.
Most of the known LRR structures have caps, which shield the hydrophobic core of the first LRR unit
at the N-terminus and/or the last unit at the C-terminus. In extracellular proteins or extracellular
regions, the N-terminal and C-terminal caps frequently consist of Cys clusters including two or four
Cys residues; the Cys clusters on the N- and C-terminal sides of the LRR arcs are called LRRNT and
LRRCT, respectively [8-10].

The crystal structures of PS-LRR domains of Phaseolus vulgaris PGIP and A. thaliana BRI1
(an LRR@IR protein) have been determined [15-17]. The structure of the BRI1 LRR domain forms a
right-handed superhelix composed of 25 PS-LRRs (Figure 1A) [16,17]; most of these 25 PS-LRRs are
24 residues long. The helix completes one full turn, with a rise of ~70 Å. The concave surface is
formed by α- and 3₁₀ helices that produce inner and outer diameters of ~30 and ~60 Å, respectively.
The consensus sequence LxGx(LL)P at positions 11 to 16 likely forms a second β-strand, which
characterizes the fold of the PS-LRRs. Thus, the structural LRR units may be represented by β-β-3₁₀.
BRI1 has both an LRRNT with Cx6C and an LRRCT with Cx6C; both the LRRNT and LRRCT form
two disulfide bonds. The disulfide bonds contribute to the stability of the N-terminal cap structure
(N-Cap), consisting of one β-strand and two α-helices, and the C-terminal cap structure (C-Cap),
consisting of two short helices.

The crystal structures of the LRR domains of A. thaliana transport inhibitor response 1 (TIR1) and
coronatine-insensitive protein 1 (COI1), which are F-box proteins, are also available [74-76]. TIR1 has
18 LRRs of various lengths (from 22 to 35 residues), of which 13 are noncanonical, imperfect LRRs
and have long β-strands of 4-6 residues. Most VS parts adopt an α-helix. Thus, the structural LRR units
may be represented by β-α. The TIR1 LRR domain forms a right-handed superhelix of one full turn,
which is represented by one closed ring, as does the BRI1 LRR domain [74,75]. The top surface of
the TIR1 superhelix has three long intra-repeat loops (loop-2 in LRR2, loop-12 in LRR12, and loop-14
in LRR14). Loop-2 plays a pivotal role in constructing the auxin- and substrate-binding surface
pocket by interacting with the nearby concave surface of the TIR1 LRR structure. The COI1 LRR
domain adopts a structure very similar to that of TIR1 [76]. Similarly, three long intra-repeat loops are
involved in the binding of the hormone (jasmonate) and polypeptide substrates [76].

Figure 1. Three-dimensional structures of the PS-LRR domains of BRI1 and PGIP.
(A) BRI1 [3RGZ]; (B) PGIP [1OGQ]. The LRRs are colored blue, the cap structures at the
N-terminal and C-terminal sides orange, the non-LRR IR in BRI1 pink, and the disulfide
bonds yellow. All figures were prepared with PYMOL.

3. Plant LRR@IR Proteins

Plant LRR@IR proteins found through previous research by Matsushima et al. [47] and by use of
keywords in the references are described. Homologs of an individual protein family from various plant
species were collected by the following procedures. First, LRRs in a representative LRR@IR protein
of each family were identified by LRR@IRpred; the number of repeat units in the preceding LRR
block (N1), the number in the following block (N2), and the non-LRR IR sequence of the LRR region
were determined. Second, database searches using the amino acid sequences of the non-LRR IR and
one LRR unit on each of the N-terminal and C-terminal sides of the IR were performed by FASTA at the
Bioinformatic Center, Institute for Chemical Research, Kyoto University on February 15, 2012.
Third, PS-LRR proteins with highly significant similarity (E-value < 10⁻¹⁰) were identified and
regarded as putative homologs, taking into account the amino acid sequence alignments of
full lengths and non-LRR IRs and their domain architecture. Finally, LRRs in
the homologs of each family were identified by LRR@IRpred. When a candidate region was not an LRR
unit and was longer than the average length of the repeating unit of the LRRs, it was defined as a
non-LRR IR.
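
The last criterion can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the LRR@IRpred program: the LRR unit coordinates are taken as given (for example, from a prior prediction step), and the function name split_lrr_blocks and the coordinates in the usage note are hypothetical.

```python
def split_lrr_blocks(units):
    """Split predicted LRR units into blocks separated by non-LRR IRs.

    units: list of (start, end) residue coordinates (1-based, inclusive)
           of LRR units, in sequence order.
    Returns (block_sizes, islands): block_sizes lists the number of repeat
    units in each LRR block (N1, N2, ...); islands lists the (start, end)
    coordinates of each region between units that is longer than the
    average unit length.
    """
    if not units:
        return [], []
    avg_len = sum(end - start + 1 for start, end in units) / len(units)
    block_sizes = [1]
    islands = []
    for (_, prev_end), (next_start, _) in zip(units, units[1:]):
        gap = next_start - prev_end - 1
        if gap > avg_len:
            # Region between two units exceeds the average unit length:
            # treat it as a non-LRR IR and start a new LRR block.
            islands.append((prev_end + 1, next_start - 1))
            block_sizes.append(1)
        else:
            block_sizes[-1] += 1
    return block_sizes, islands
```

With ten hypothetical 24-residue units, a 60-residue gap, and three further units, the function returns block sizes [10, 3] (N1 = 10, N2 = 3) and one island. Two islands, as in the RPK2 homologs, would yield three block sizes.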

The following sequence analyses were also carried out: signal sequence analysis by the program
SignalP (http://www.cbs.dtu.dk/services/SignalP/) [77], transmembrane predictions by TMHMM
(http://www.cbs.dtu.dk/services/TMHMM/) [78], and the identification of other characteristic regions
by SMART (http://smart.embl-heidelberg.de/smart/set_mode.cgi?GENOMIC=1) [2].

Finally, the 19 families of 344 LRR@IR proteins are described (Supplementary Table S1). The 19
families are grouped into LRR-RLKs, LRR-RLPs, and intracellular proteins. At least one protein in
each family has clear experimental evidence for its existence or expression data (such as
cDNA(s), RT-PCR, or Northern hybridization) showing the existence of a transcript. TMHMM predicts that
A. thaliana PSYR1 and RPP27 contain a transmembrane region at the N-terminal side (Supplementary
Table S1). However, orthology or domain structure was taken into account, and these two proteins
were regarded as LRR-RLKs. SignalP predicts no signal peptide in A. thaliana At1g74360 and the
soybean putative disease resistance protein. Similarly, these proteins were regarded as an LRR-RLK
and an LRR-RLP, respectively.

Reported counts of LRR-RLKs are 165/233/239 proteins in A. thaliana, 292/357 proteins in O. sativa
subsp. japonica (rice), and 440 in Populus trichocarpa (poplar) [42,79,80]. Reported counts of LRR-RLPs
are 90 in rice (O. sativa) and 48/56 in A. thaliana [42,46]. There are LRR-RLKs and
LRR-RLPs having no non-LRR IRs, such as FLS2, Xa21, and TMM [81]. LRR-containing receptor-like
cytoplasmic kinases (LRR-RLCKs), which lack an extracellular domain, have no non-LRR IRs [79,82].

The present review could not describe all families of LRR@IR proteins in plants, because the
survey of LRRs having non-LRR IRs was limited to those identified by LRR@IRpred.

3.1. Six Families of LRR-RLKs 

LRR-RLKs have an extracellular LRR region with an N-terminal signal peptide, a single
transmembrane-spanning region, and an intracellular serine-threonine kinase region [18,19].
Transmembrane kinase 1 (TMK1), brassinosteroid insensitive 1 (BRI1), A. thaliana At1g74360
protein, phytosulfokine receptor (PSKR), tyrosine-sulfated glycopeptide receptor 1 (PSYR1), and
LRR receptor-like serine/threonine-protein kinase RPK2 are members of the LRR-RLK family.
The LRR-RLKs are LRR@IR proteins in which the LRRs are intersected by a single non-LRR IR;
only RPK2 has two IRs (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1).

Figure 2. Schematic representation of six LRR-RLKs having LRR domains intersected
by non-LRR island regions. Arabidopsis thaliana TMK1 [TMK1_ARATH]; A. thaliana
BRI1 [BRI1_ARATH]; Daucus carota PSKR [PSKR_DAUCA]; A. thaliana PSYR1
[PSYR1_ARATH]; A. thaliana At1g74360 [Y1743_ARATH]; A. thaliana RPK2
[RPK2_ARATH].

Table 1. Nineteen families of plant LRR proteins having LRR domains intersected by
non-LRR island regions. a "N1" is the repeat number of LRRs of the first LRR block in the
homologs of each family. b "N2" is the repeat number of LRRs of the second LRR block in
the homologs of each family. c "N1/N2" is the average value. d The LRR domain in
Arabidopsis RPK2 contains two non-LRR IRs. The number "13" is the sum of the repeat
numbers of LRRs of the first and second LRR blocks. The number "8" is the repeat number
of the third LRR block.

The transcript concentration of O. sativa TMK1 increases in the rice internode in response to
gibberellins [83]. Nicotiana tabacum TMK1 mRNA accumulation in leaves was stimulated by CaCl2,
methyl jasmonate, wounding, fungal elicitors, chitins, and chitosan [84]. TMK1 orthologs were
identified from 14 plant species and its paralogs are present in 10 species, including A. thaliana,
Glycine max, and O. sativa (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1).
G. max Rhg4, which is a soybean cyst nematode resistance gene [85], was also identified as a TMK1
homolog, whereas G. max Rhg1 [C9VZY3] contains 13 PS-LRRs of 24 residues in which only LRR6 is
29 residues long. The TMK1 homologs contain 13 LRRs interrupted by a 56- to 76-residue
non-LRR IR. The number of repeat units in the preceding LRR block (N1) is greater than the number
in the following block (N2); that is, N1 » N2, with N1 = 10 and N2 = 3. The non-LRR IRs have a
cluster of four Cys residues with the pattern Cx6-7Cx29-30Cx6-11C and a conserved motif of
Lx8Yx7-8WxG, where "Y" is Tyr or Phe, "W" is Trp, and "G" is Gly; this motif is similar to Yx8KG
found in many LRR-RLPs [46]. An LRRNT (with Cx6C) is observed, but not an LRRCT. Putative
C-Cap regions are rich in Gly, Ser, and Pro residues.

BRI1/SR160 is a receptor complex for brassinosteroids, which are necessary for plant development,
including expression of light- and stress-regulated genes, promotion of cell elongation, normal leaf and
chloroplast senescence, and flowering [86-92]. BRI1 orthologs were identified from 24 species and its
paralogs are also present in 10 species. The BRI1/SR160 homologs contain 21-26 LRRs with a single
non-LRR IR. The N1 value is relatively variable among species and is 10-22, while N2 = 4; N1 » N2
(Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). A. thaliana BRI1 contains 25
LRRs interrupted by a 70-residue IR between LRR21 and LRR22. The non-LRR IR, together with
LRR22, binds brassinosteroids [62]. The non-LRR IRs of the BRI1 homologs are 68-70 residues long
and have a cysteine cluster of Cx25-26C and a conserved motif of R(I/V/M/L)Y. An LRRNT (with
Cx6C) and an LRRCT (with Cx6C) were observed; only soybean BR [C6ZRS8] and Ricinus communis
LRR-RLK [B9T4K2] have LRRNTs with Cx26-27C. The LRRCT regions are rich in His, Arg, and Lys
residues, and thus are basic.

PSKR is a PSK receptor that regulates, in response to PSK binding, a signaling cascade involved in
plant cell differentiation, organogenesis, and somatic embryogenesis [55,63,93,94]. PSKR orthologs
and paralogs were identified (Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). The
PSKR homologs contain LRRs with a 36- to 38-residue non-LRR IR; N1 = 17-18 and N2 = 4
(Figure 2 and Table 1, and Supplementary Table S1 and Figure S1). The non-LRR IRs have a
conserved motif of (Y/F)x5-12Yx5F. Most LRRCT regions are basic. Daucus carota PSKR contains
22 LRRs intersected by a 36-residue IR between LRR17 and LRR18. An LRRNT (with Cx33CCx6C)
that is similar to that in PGIP [15] and an LRRCT (with CxgC) are observed. A 15-residue region within
the non-LRR IR is a binding site of PSK [63]. The corresponding regions in the homologs are
relatively variable.

A. thaliana PSYR1 regulates, in response to tyrosine-sulfated glycopeptide binding, a signaling
cascade involved in cellular proliferation and plant growth [95]. The PSYR1 homologs from seven
species contain 21-22 LRRs with a 37-residue non-LRR IR (N1 = 17-18 and N2 = 4) (Figure 2 and
Table 1, and Supplementary Table S1 and Figure S1). The non-LRR IRs have a conserved motif of
Yx2LPVFx4Nx4Qx2-3QLSxL. The LRRNT (with four, five, or seven Cys residues) and the LRRCT
(with CxjC) are observed. The LRRCT regions are basic.

A. thaliana At1g74360 is a BRI1-related protein (Figure 2 and Table 1, and Supplementary Table
S1 and Figure S1). Putative orthologs and paralogs were identified from 10 species. The At1g74360
family contains 21-22 LRRs with a single IR. The N1 value is relatively conserved among species
(N1 = 16-17), while N2 is not 4 but 5. The 76-residue non-LRR IRs are longer than those in
BRI1 and have a cysteine cluster with the pattern of Cx^CxuC. The IRs are highly conserved among
the homologs.

A. thaliana RPK2 is a key regulator of anther development (e.g., lignification pattern), including
tapetum degradation during pollen maturation (e.g., germination capacity) [96-98], and contributes to
shoot apical meristem homeostasis [99,100]. The RPK2 homologs from Arabidopsis lyrata subsp.
lyrata, Populus trichocarpa, and R. communis contain 21-22 LRRs with two non-LRR IRs. The first
IR is between LRR9 and LRR10. The second IR is between LRR13 and LRR14 (Figure 2 and Table 1,
and Supplementary Table S1 and Figure S1). The second IRs are highly conserved among homologs.
There are an LRRNT (with Cx^Cx^Cx^C) and an LRRCT (with CxuC). The LRRCT region is rich in
Ser and Pro residues. Sawa and Tabata [101] have reported RPK2 homologs from other plant
species: Musa acuminata, O. sativa Japonica Group, Vitis vinifera, Sorghum bicolor, Physcomitrella
patens, and Marchantia polymorpha.

3.2. Eleven Families of LRR-RLPs

LRR-RLPs have a short cytoplasmic tail instead of the kinase region in LRR-RLKs (Figure 3) [20].
LRR-RLPs are involved in both disease resistance in plant-pathogen interactions and development [34,102].
Tomato Cf genes confer resistance to the fungal pathogen Cladosporium fulvum [43,56,103,104].
The tomato Verticillium wilt disease resistance genes Ve1 and Ve2, apple HcrVf2, and Arabidopsis RPP27 are
involved in resistance to Verticillium, Venturia, and Peronospora, respectively [105-107].
Furthermore, the tomato LeEIX initiates defense responses upon elicitation with a fungal
ethylene-inducing xylanase (EIX) of non-pathogenic Trichoderma from tomato that confer resistance
against the fungal pathogen Cladosporium fulvum [108,109]. Clavata 2 (CLV2) functions in both
shoot and root meristems of Arabidopsis [58,110-112] and also affects autoregulation of nodulation in
pea and Lotus japonicus [113,114]. Zea mays fasciated ear 2 is involved in meristem
development [59]. A. thaliana RLP2 is involved in the perception of CLV3 and CLV3-like peptides,
which act as extracellular signals regulating meristem maintenance [64]. The LRR-RLPs are all
LRR@IR proteins in which the LRRs are intersected by a single non-LRR IR (Figure 3 and Table 1,
and Supplementary Table S1 and Figure S1).

Figure 3. Schematic representation of 11 LRR-RLPs having LRR domains intersected by
non-LRR island regions. Currant tomato Cf-9 [Q40235]; Currant tomato Cf-2.1 [Q41397];
Tomato Ve1 [Q94G61]; Apple HcrVf1 [Q949G9]; A. thaliana RPP27 [Q70CT4]; Tomato
EIX1 [Q6JN47]; A. thaliana CLV2 [Q9SPE9]; Maize fasciated ear2 [Q940E8];
A. thaliana RLP2 [RLP2_ARATH]; Oryza sativa Os10g0469700 [Q337L7]; Soybean
disease resistance protein [C6ZS07].

Tomato Cf-9/Cf-4 homologs were identified from six species. Elicitor-inducible LRR
receptor-like protein (EILP) from N. tabacum [115] was identified as an ortholog of tomato Cf-9/Cf-4.
N1 is 18 to 22, while N2 remains 4, and the non-LRR IRs are 40-44 residues long (Figure
3 and Table 1, and Supplementary Table S1 and Figure S1) and have a conserved motif of
MKx3Ex6Yx5-8Yx7TKG in which hydrophilic residues are conserved. The EILP protein also contains
27 LRRs with N1 = 23 and N2 = 4. Most of the homologs have an LRRNT consisting of six Cys residues
with the pattern of CX24-29CX13-23CCX6CX12-13C. However, peru 1 and peru 2 have an LRRNT of four
Cys residues with the pattern of Cx^CCxfjC [116]. The C-terminal side of the LRRCT is rich in Glu and Asp
residues and thus is acidic.

Tomato Cf-2/Cf-5 homologs were identified from two species (Lycopersicon esculentum and
L. pimpinellifolium). N1 is highly variable (N1 = 20-33), while N2 remains 4, and the
non-LRR IRs are 37-41 residues long. The IRs are hydrophilic. The variability of N1 among
the paralogs and orthologs has been reported by other researchers [43,46,103,104] (Figure 3 and Table 1, and
Supplementary Table S1 and Figure S1). Interestingly, the N-terminal LRRs include tandem repeats of
a super-motif of two highly conserved LRRs; for example, LxxLxLxxNxLSGxIPxxIGYLRS and
LxxLxLSxNxLNGxIPxxFGxLxN in currant tomato Cf-2.1 [103].

Tomato Ve orthologs and paralogs were identified from twelve species including Solanum neorickii,
S. aethiopicum, Mentha longifolia, and M. spicata [105,117,118]. The Ve homologs contain 32-34
LRRs interrupted by a 44- to 49-residue non-LRR IR with N1 = 28-30 and N2 = 4 (Figure 3 and Table
1, and Supplementary Table S1 and Figure S1). The non-LRR IRs have a conserved motif of
YYx8K(G/R) and are relatively hydrophilic.

Apple HcrVfs (Homologs of Cladosporium fulvum resistance genes of Vf region) are scab resistance
genes [119,120]. Mentha longifolia HcrVfs are orthologs of tomato Ve genes [105,117,118]. The HcrVf
paralogs contain 32-34 LRRs interrupted by a 41- to 46-residue non-LRR IR with
N1 = 22-28 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). The
non-LRR IRs have a conserved motif of VTKGxExEYx(K/E)ILxFxKxxDLSCNF in which
hydrophilic residues are conserved. The C-terminal side of the LRRCT is rich in Gly and Pro residues.

A. thaliana RPP27 homologs were also identified from A. lyrata. The LRR@IR proteins contain
16-30 LRRs interrupted by a 61- to 71-residue non-LRR IR with N1 = 11-26 and N2 = 4 (Figure 3
and Table 1, and Supplementary Table S1 and Figure S1). The IRs have a conserved motif of
FxxKxRYD. The C-terminal side of most LRRCT regions is acidic.

Tomato LeEIX1 and LeEIX2 contain 31 LRRs interrupted by a 47- to 49-residue non-LRR IR with
N1 = 27 and N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1).
The C-terminal side of the LRRCT is acidic.

A. thaliana CLV2 homologs were identified from 11 species. The CLV2 homologous proteins
contain 22 LRRs interrupted by a 41- to 43-residue non-LRR IR with N1 = 18 and N2 = 4 (Figure 3 and
Table 1, and Supplementary Table S1 and Figure S1). The IRs have a conserved motif of LxFxYxL.
The C-terminal side of most LRRCT regions is acidic. A. thaliana CLV1 is an LRR-RLK but not
an LRR@IR protein.

Z. mays fasciated ear2 is an ortholog of Arabidopsis CLV2. The homologs were also identified
from O. sativa subsp. japonica and indica, and S. bicolor. The fasciated ear2 homologous proteins
contain 17-18 LRRs interrupted by a 41- to 42-residue IR with N1 = 10-14 and N2 = 4 (Figure 3 and
Table 1, and Supplementary Table S1 and Figure S1). The IRs and the LRRCT regions are rich in Gly.
Both regions may be flexible.

A. thaliana RLP2 contains 23 LRRs that are interrupted by a 44-residue IR with N1 = 18 and
N2 = 4 (Figure 3 and Table 1, and Supplementary Table S1 and Figure S1). There are an LRRNT and
an LRRCT. The extracellular region including the 23 LRRs is homologous to that in A. thaliana
PSYR1 [121].

O. sativa Os10g0469700 is an LRR@IR protein; its function is unknown (Figure 3 and Table 1,
and Supplementary Table S1 and Figure S1). The homologs from four species contain 10 LRRs with a
single IR with N1 = 6 and N2 = 4. The non-LRR IRs of 39-40 residues are represented by
MKxP(K/E)IxSSx2-3LDGSxYQDRIDIxWKGx3FQx4L.

A putative disease resistance protein from soybean [C6ZS07] is an LRR@IR protein (Figure 3 and
Table 1, and Supplementary Table S1 and Figure S1). The homologs were identified from four species
and contain 8-32 LRRs with a single IR with N1 = 4-28 and N2 = 4. The N1 number is highly variable
in both the paralogs and orthologs. The IRs have a conserved motif of Yx2Sx5Kx7(R/K)I.

3.3. Two Families of Plant Intracellular Proteins 

A. thaliana TONSOKU (TSK)/MGOUN3 (MGO3)/BRUSHY1 (BRU1), which is localized in the
nucleus and is more strongly expressed in the shoot apex than in the leaves and stems, is required for
cell arrangement in root and shoot apical meristems and is involved in structural and functional
stabilization of chromatin [122-124]. The TONSOKU protein may represent a link between the response
to DNA damage and epigenetic gene silencing [125].

Potential homologs of A. thaliana TONSOKU have been identified in eight species. The UniProtKB
database describes A. thaliana TONSOKU as containing three LRRs and eight TPRs, while the
databases InterPro, Gene3D, SMART, and PROSITE identify only TPRs. LRR@IRpred identifies 14
LRRs with a single IR; N1 = 13, N2 = 1 (Figure 4 and Table 1, and Supplementary Table S1 and Figure
S1) [47]. The LRRs are not "plant-specific" motifs but presumably "RI-like" motifs. Thus, the
structural LRR units may be represented by β-α instead of β-β-3₁₀. The LRR domain is predicted to
adopt a typical horseshoe shape seen in ribonuclease inhibitor [126]. The non-LRR IRs are 70-131
residues long and are rich in Ser and Gly. The IRs may be unstructured or flexible.

A. thaliana MJK13.7 is considered to be an intracellular protein. Its function is unknown. A. thaliana
MJK13.7 homologs were identified from 11 species. The homologs contain 20 LRRs intersected by a
single IR; N1 = 12, N2 = 8 (Figure 4 and Table 1, and Supplementary Table S1 and Figure S1). All of
the non-LRR IRs are 60-62 residues long and have conserved Lys residues at five positions. The
consensus of the LRRs is LxxLxLxxNxLxxLPxxLxxLxx of 23 residues, which is present in many
proteins from bacteria to human (data not shown). The LRR motif does not belong to PS-LRR and the
structure of the LRR domain is not available. However, the LRR motifs are contained in part of the
LRR domains in toll-like receptor 1 (TLR1) and glycoprotein Ibα (GpIbα), of which the crystal
structures are available [127-130]. Four LRRs are IKVLDLHSNKI KSIPKQWKLEA and
LQELNVASNQL KSVPDGIFDRLTS in TLR1, and LGTLDLSHNQL QSLPLLGQTLPA and
LDTLLLQENSL YTIPKGFFGSHL in GpIbα. The structures revealed that the LRRs may be
characterized by extended conformations at the bold sequences [127-130].

Figure 4. Schematic representation of two plant intracellular LRR@IR proteins having
LRR domains intersected by non-LRR island regions. A. thaliana MJK13.7 [Q9M7W9];
A. thaliana TONSOKU [Q6Q4D0].

Moreover, A. thaliana MJK13.7 forms a family with its homologs from insect species,
Strongylocentrotus purpuratus, Nematostella vectensis, and Paramecium tetraurelia, and LRRC40
from vertebrate species [47]. The S. purpuratus protein has 163 residues containing two repeats of
64 residues each [47].



3.4. An NBS-LRR Protein 



The rice blast resistance gene Pi-ta encodes an NBS-LRR protein with 928 residues [44,45]. The Pi-ta
protein [Q9AY26] lacks a canonical LRR [44]. The C-terminal region contains highly imperfect LRRs
with 10 repeats of various lengths (from 16 to 75 residues) based on the consensus LxxLxxL. The Pi-ta
protein appears to be an LRR@IR protein. LRR@IRpred predicts 13 LRRs of 20-54 residues with one
non-LRR IR between LRR6 and LRR7 (Supplementary Figure S1). The secondary structure
prediction favors α-helix in the VS parts. The Pi-ta LRR domain might adopt a structure similar to those
of TIR1 and COI1 [74-76].

4. Features, Structure, Function, and Evolution of the LRR Domains in Plant LRR@IR Proteins 

4.1. Fundamental Features 

Most plant LRR@IR proteins that are LRR-RLKs or LRR-RLPs keep the rule of N1 » N2; N1 = 10-30
and N2 = 3-5 (Table 1). The same rule of N1 » N2 is observed in other LRR@IR proteins, the toll
receptors and toll-related proteins from insect species, which, like TLRs, have a single
transmembrane-spanning region and an intracellular Toll/IL-1 receptor (TIR) domain instead of the
kinase region in LRR-RLKs [131]. Most toll receptors and toll-related proteins contain 21-30 LRRs
interrupted by a single non-LRR IR of 81-120 residues with N1 » N2; N1 = 17-24 and N2 = 4-6 (data not shown).
Fritz-Laylin et al. [46] have performed sequence analysis of 90 LRR-RLPs of rice (O. sativa) and
56 of Arabidopsis (A. thaliana). Many LRR-RLPs contain 18-28 LRRs interrupted by a 30- to 80-residue
single IR with N1 » N2; N1 = 14-24 and N2 = 4 [46].
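
The N1 » N2 rule can be checked directly against values quoted in Section 3. The Python sketch below is a minimal illustration for a subset of families whose N1 range and N2 are stated unambiguously in the text; the dictionary FAMILIES and the function names are illustrative, and Table 1 remains the authoritative source.

```python
# (N1 minimum, N1 maximum, N2) as quoted in Section 3 for a subset of families.
FAMILIES = {
    "TMK1/Rhg4": (10, 10, 3),
    "PSKR": (17, 18, 4),
    "PSYR1": (17, 18, 4),
    "At1g74360": (16, 17, 5),
    "Ve": (28, 30, 4),
    "EIX": (27, 27, 4),
    "CLV2": (18, 18, 4),
    "Os10g0469700": (6, 6, 4),
}


def n1_exceeds_n2(families):
    """Map each family to True if even its smallest N1 is greater than N2."""
    return {name: n1_min > n2 for name, (n1_min, _n1_max, n2) in families.items()}


def smallest_ratio(families):
    """Return (family, ratio) for the lowest N1_min / N2 ratio."""
    name = min(families, key=lambda k: families[k][0] / families[k][2])
    n1_min, _n1_max, n2 = families[name]
    return name, n1_min / n2
```

For this subset, N1 exceeds N2 in every family, and the rice Os10g0469700 family (N1 = 6, N2 = 4) has the smallest ratio, which is why it is reported separately from the other LRR-RLP families.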

The non-LRR IRs in plant LRR@IR proteins may be classified into two groups; one group consists of
non-LRR IRs having cysteine clusters, while the other has no cysteine clusters. The IR cysteine
clusters are characterized by CX6-7CX29-30CX7-11C in A. thaliana TMK1 homologs, CX25C in BRI1
homologs, and Cx^Cx^C in At1g74360 homologs. The other non-LRR IRs frequently have a
conserved motif of Yx8KG, which is observed in the homologs of A. thaliana TMK1, tomato
Cf-9/Cf-4, tomato Cf-2/Cf-5, tomato Ve, M. longifolia HcrVf, A. thaliana CLV2, Z. mays
fasciated ear2, and O. sativa Os10g0469700. Non-LRR IRs in many LRR-RLPs from Arabidopsis and
rice contain a conserved motif of Yx8KG [46].

Most of the LRRNTs consist of two, four, or six Cys residues, with the patterns CX6-7C,
CX23-34CCX6C, and CX24-29CX13-23CCX6CX12-13C. They probably form one, two, and three disulfide
bonds, respectively. The LRRCTs consist of two Cys residues with the pattern CX4-29C, which probably
form one disulfide bond (Supplementary Table S1 and Figure S1). The disulfide bonds should
contribute to the structural stabilization of the N-terminal and C-terminal caps.
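
The Cys-spacing notation used above (for example, CX6-7C) maps directly onto a regular expression, which allows candidate clusters to be located in a sequence. The following Python sketch is an assumption-laden illustration rather than part of LRR@IRpred: the function name cys_pattern_to_regex is hypothetical, "X" is treated as any residue, and only the tokens "C" and "XN" or "XN-M" are accepted.

```python
import re

# One token is either a literal Cys ("C") or a spacer "xN" / "xN-M".
_TOKEN = re.compile(r"C|x(\d+)(?:-(\d+))?", re.IGNORECASE)


def cys_pattern_to_regex(pattern):
    """Convert notation such as 'CX6-7C' or 'Cx23-34CCx6C' to a compiled regex.

    'C' is a cysteine; 'xN' means exactly N residues of any type and
    'xN-M' means N to M residues. Raises ValueError on other input.
    """
    parts = []
    pos = 0
    for m in _TOKEN.finditer(pattern):
        if m.start() != pos:
            raise ValueError(f"unrecognized notation at position {pos}: {pattern!r}")
        if m.group(1) is None:
            parts.append("C")
        else:
            low = m.group(1)
            high = m.group(2) or low
            parts.append(".{%s,%s}" % (low, high))
        pos = m.end()
    if pos != len(pattern):
        raise ValueError(f"unrecognized notation at position {pos}: {pattern!r}")
    return re.compile("".join(parts))
```

A call such as cys_pattern_to_regex("CX23-34CCX6C").search(sequence) then reports the first region compatible with the four-Cys LRRNT pattern. Because the spacer matches any residue, including Cys, a hit indicates only a compatible spacing and not a confirmed disulfide-bonded cluster.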

4.2. Possible Structures 

The structure of a non-LRR IR is available in A. thaliana BRI1 (Figure 1A). The BRI1 LRR
domain forms a superhelix with 25 LRRs. The 70-residue non-LRR IR in BRI1 between LRR21 and
LRR22 forms a small domain that folds back into the interior of the superhelix, where it makes
extensive polar and hydrophobic interactions with LRRs 13-25 [16,17]. The LRR domain fold is
characterized by an anti-parallel β-sheet, which is sandwiched between the LRR core and a 3₁₀ helix
and stabilized by a disulfide bridge of the Cys cluster with CX25C in the non-LRR IR. Cys clusters
are also present in non-LRR IRs in the homologs of TMK1, At1g74360, and TONSOKU. Thus, the
non-LRR IRs may adopt similar structures with disulfide bridges. All of the non-LRR IRs would fold
back into the interior or exterior of a superhelix of the LRR domains.

4.3. Possible Function(s) 

The non-LRR IRs of BRI1 and PSKR participate in ligand/protein-protein interactions. The BRI1 
non-LRR IR binds brassinosteroids [62]. The insertion of a folded domain into the LRR repeat is 
probably an adaptation to the challenge of sensing a small steroid ligand [16]. The PSKR non-LRR IR 
likewise binds PSK [63]. The non-LRR IRs in TLRs 7, 8, and 9 were also predicted to contribute to 
nucleic acid-protein interaction [66,132]. 

The non-LRR IRs in plant LRR@IR proteins frequently have conserved motifs that are 
characterized by hydrophilic residues such as Lys, Arg, Glu and Asp, as noted. Some non-LRR IRs are 
presumably flexible. The conservation of hydrophilic residues in the IRs is also observed in the 
respective families of LRRC40, LRRC9, and C. elegans LRK-1, which are LRR@IR proteins from 
organisms other than plants, including vertebrates [47]. These IRs might contribute to 
ligand/protein-protein interactions [47]. Moreover, Afzals et al. [133] suggested, based on circular 
dichroism data, that non-LRR IRs are intrinsically unstructured, providing binding diversity to 
the domains. 

The first LRR blocks in tomato Cf-9, Cf-4, and Cf-2 recognize fungal avirulence proteins [134-138]. 
The recognition specificity of Cf-2, with 37 LRRs, lies between LRR3 and LRR27, 
a region that differs from Cf-5, with 31 LRRs, by six extra LRRs and 78 amino acid substitutions [134]. 
Although crudely defined, this region of specificity corresponds to those in Cf-4, Cf-9, and Cf-9B 
responsible for recognition of their cognate ligands [135-138]. Biochemical studies show that CLV2 is 
essential for the stability of CLV1, and that CLV1 and CLV2 may form a disulfide-linked 
heterodimer of 185 kD [58]; CLV1 is an LRR-RLK having no non-LRR IR. 

Drosophila Toll and vertebrate TLRs 7, 8, and 9 are LRR@IR proteins [65-67] that contain a 
single transmembrane-spanning region, as do LRR-RLKs and LRR-RLPs from plants. Homo- or 
heterodimerization is involved in the ligand interactions of vertebrate TLRs [68-71]. A model for 
Drosophila Toll activation by its ligand Spätzle has been proposed: the first LRR block interacts with 
Spätzle, and the second LRR block forms strong dimer contacts that are prevented by the first block, 
which in the absence of ligand provides a steric constraint [67,131]. BRI1 receptor activation 
involves homodimerization [139]; however, Hothorn et al. [16] suggested that the superhelical BRI1 
LRR domain alone has no tendency to oligomerize, indicating that BRI1 receptor activation may not 
be mediated by ligand-induced homodimerization of the ectodomain. 

Taken together, non-LRR IRs in plant LRR@IR proteins might participate in ligand/protein 
interactions, dimerization, or both, although an LRR-RLP, A. thaliana CLV2, remains functional without 
its non-LRR IR, while the first and the second LRR blocks are essential for functionality [64]. N1 » N2 
places the non-LRR IRs that interact with ligand/protein in close proximity to the transmembrane region. 
N1 » N2 might thus facilitate signaling in the cytoplasm through the ligand/protein interactions. 

There is a possibility that Cys residues in LRRs are involved in dimerization of LRR@IR proteins. The 
conserved hydrophobic residues of the PS-LRR consensus sequence of LxxLxLxxNxLSGxIPxxLxxLxx at 
positions 1, 4, 6, 11, 15, 19, and 22 contribute to the hydrophobic cores in the LRR arcs [8,9]. 
The conserved hydrophobic residues at positions 1, 19 and 22, and "N" at position 9, are frequently 
occupied by Cys in the PS-LRRs. Moreover, Cys residues are frequently observed in noncanonical 
PS-LRRs which, as examples, are longer LRR motifs of 25-30 residues with the consensus of 
LxxLxLxxNxLSGxIPxxLCxxxxx(x/-)(x/-)(x/-)(x/-)(x/-), in which "-" indicates a possible deletion site. 
At the present stage it remains unknown whether the Cys residues contribute to the hydrophobic core 
of the LRR arcs or are exposed to solvent. However, some LRR@IR proteins contain PS-LRRs having 
Cys at positions 2, 3, or 5 in the HCS part (Supplementary Table S1). The Cys residues are likely to be 
exposed to solvent in the LRR arc and thus might induce dimerization. 
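
The position numbering used in this paragraph can be checked directly against the consensus string. The following is a minimal sketch, assuming 1-based numbering along the 24-residue PS-LRR consensus and treating Leu and Ile as the conserved hydrophobic residues; the helper names and the example repeat are hypothetical, and the repeat is synthetic rather than taken from any protein in this review.

```python
PS_LRR_CONSENSUS = "LxxLxLxxNxLSGxIPxxLxxLxx"


def hydrophobic_positions(consensus: str) -> list:
    """1-based positions of conserved Leu/Ile in a consensus string."""
    return [i for i, ch in enumerate(consensus, start=1) if ch in "LI"]


def cys_positions(repeat: str) -> list:
    """1-based positions of Cys in one LRR repeat unit."""
    return [i for i, ch in enumerate(repeat, start=1) if ch == "C"]


print(len(PS_LRR_CONSENSUS))                    # 24
print(hydrophobic_positions(PS_LRR_CONSENSUS))  # [1, 4, 6, 11, 15, 19, 22]

# Synthetic repeat: consensus with "x" written as Ala and Cys placed at
# positions 1, 9, and 19 (positions the text reports as frequently Cys).
synthetic_repeat = "CAALALAACALSGAIPAACAALAA"
print(cys_positions(synthetic_repeat))          # [1, 9, 19]
```

Such bookkeeping only locates Cys residues within a repeat; whether a given Cys is buried or solvent-exposed still has to be judged from a structure or model.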

4.4. Implications for Evolution 

What is the evolutionary origin of non-LRR IRs interrupting LRRs? Previous research provided 
evidence that a direct duplication of super-motifs containing non-LRR regions naturally leads to 
the occurrence of non-LRR IRs in LRR@IR proteins from eukaryotes other than plants, including 
LRR-containing protein 17 (LRRC17), LRRC32, LRRC33, chondroadherin-like protein, trophoblast 
glycoprotein precursor, and Leishmania proteophosphoglycans [47]. The non-LRR IRs 
in plant LRR@IR proteins might originate from similar events. 

The tomato Cf-2/Cf-5 homologs have PS-LRRs that include tandem repeats of a super-motif of 
two highly conserved LRRs, as noted [103]. Duplications of the super-motif were suggested to 
have occurred in the Cf-2/Cf-5 homologs [43]. Super-motifs of LRRs are observed in many LRR 
proteins. The SLRP subfamily (biglycan, decorin, asporin, lumican, fibromodulin, PRELP, keratocan, 
osteoadherin, epiphycan, osteoglycin, opticin, and podocan), the TLR7 family (TLR7, TLR8 and 
TLR9), the FLRT family (FLRT1, FLRT2, and FLRT3), and OMGP [65,140,141] contain tandem 
repeats of a super-motif of STT, where "T" is a "typical" LRR and "S" is a "bacterial" LRR. 
Ribonuclease inhibitor also has RI-LRRs consisting of a super-motif of 57 residues that encodes two 
LRRs [142]. These super-repeats, like those of Cf-2/Cf-5, have been attributed to the duplication of 
their super-motifs. 

5. Evolution of Plant LRR@IR Proteins 

A large number of LRR-RLPs resembling the extracellular domains of LRR-RLKs are found in the 
Arabidopsis genome, although not all RLK subfamilies have corresponding RLPs [121]. Indeed, the 
present analysis indicates that the extracellular domain in PSYR1 is highly similar to that in RLP2. 
The same pattern also occurs in LRR@IR proteins from other plants, such as S. bicolor and 
O. sativa (Supplementary Figure S2). Here four examples of LRR-RLK/LRR-RLP pairs are described. 
The first two are Sb10g028170/Sb10g028210 and Os06g0691800/Os06g0692700; all four proteins contain 
22 LRRs intersected by a single non-LRR IR of 33 residues with N1 = 18 and N2 = 4. The others are 
Os07g0597200/Os03g0400850 and OsI_26735/OsI_11946; the LRR-RLKs Os07g0597200 and 
OsI_26735 are homologs of Arabidopsis At1g74360. In pairwise comparisons, the amino acid 
sequence identity exceeds 50% within each pair. The above observations indicate that the 
LRR-RLKs and LRR-RLPs evolved from gene duplications and recombination [39]. 
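
The identity threshold quoted above depends on how percent identity is computed. The following is a minimal sketch of one common definition, assuming the two sequences have already been aligned (equal length, "-" for gaps) and that identity is counted over columns where neither sequence has a gap. The alignment method and denominator used for the pairs described here are not stated in the text, so both choices are assumptions, and the input strings are synthetic.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Synthetic aligned fragments (not real protein sequences).
identity = percent_identity("LSGAIP", "LTGAVP")
print(round(identity, 1))        # 66.7 (4 identical of 6 compared columns)
print(identity > 50.0)           # True
```

Other denominators (for example the full alignment length or the shorter sequence length) give lower values for gapped alignments, so a stated threshold is only comparable when the definition is fixed.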

Two putative uncharacterized proteins from Z. mays with 717 residues [B8A2X8] and with 
623 residues [B8A383_MAIZE] are paralogs of Z. mays TMK1 with 958 residues (Supplementary 
Figure S2). The 717-residue protein contains 6 LRRs; N1 = 3 and N2 = 3. There are other examples: 
a hypothetical protein from Z. mays with 247 residues [C0PL86] and fasciated ear2 with 613 residues, 
and O. sativa Os02g0782800 with 441 residues [Q6K7E5] and BRUSHY1 with 1,332 residues [Q6K7D3]. 
The occurrence of these proteins is attributed to gene duplication and deletions. 

6. Conclusions 

Most plant LRR@IR proteins have LRRs intersected by a single IR with N1 » N2, in which N1 is variable 
in their individual homologs, while N2 is highly conserved. For all known LRR-RLPs, N2 = 4. The rule 
of N1 » N2 might play a common, significant role in ligand interaction, dimerization, and/or signal 
transduction of the LRR-RLKs and the LRR-RLPs. All of the LRR domains consisting of PS-LRRs 
are predicted to form a superhelix, and the non-LRR IRs in plant LRR@IR proteins fold back into the 
interior or exterior of the superhelix. The present analyses suggest that some LRR-RLKs and 
LRR-RLPs evolved from gene duplications and recombination. The present review will stimulate 
various experimental studies to understand the structure and evolution of the LRR domains with 
non-LRR IRs and their proteins. 